Introduction

This file outlines the sampling strategy for Sierra Leone.

General Notes on Sampling

The aim of the Global Education Policy Dashboard school survey is to produce nationally representative estimates, which will be able to detect changes in the indicators over time at a minimum power of 80% and with a 0.05 significance level. We also wish to detect differences by urban/rural location.
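As a rough illustration of what this power target implies, the sketch below uses base R's `power.t.test` to compute the smallest standardized change detectable between two survey rounds. It assumes two independent simple random samples of 200 schools each and an indicator scaled to unit variance. The actual design is stratified and clustered, so design effects would change the figure.

```r
# Minimum detectable effect (in standard deviations) when comparing two
# survey rounds. Assumptions (not from the sampling plan): independent
# simple random samples, 200 schools per round, indicator SD of 1.
mde <- power.t.test(
  n = 200,             # schools per round (assumed)
  sd = 1,              # standardized indicator (assumed)
  sig.level = 0.05,    # significance level stated in the text
  power = 0.80,        # minimum power stated in the text
  type = "two.sample",
  alternative = "two.sided"
)
mde_sd <- mde$delta    # solved-for difference in means
print(round(mde_sd, 2))
```

Under these simplifying assumptions the detectable change is a bit more than a quarter of a standard deviation; a smaller sample or a design effect above 1 would make it larger.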

For our school survey, we will employ a two-stage random sample design. In the first stage, a sample of around 200 schools (the exact number depending on local conditions) is drawn in advance by Bank staff. In the second stage, a sample of teachers and students is drawn in the field to answer questions from our survey modules. A total of 10 teachers will be sampled for absenteeism. Five teachers will be interviewed and given a content knowledge exam. Three 1st grade students, chosen at random, will be assessed, and one randomly chosen classroom of 4th grade students will be assessed. Stratification will be based on the school’s urban/rural classification and on region. When stratifying by region, we will work with our partners within the country to make sure we include all relevant geographical divisions.
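The in-field second stage can be sketched as a simple random draw from school rosters. The roster vectors and the `draw_school_sample` helper below are hypothetical, and the sketch assumes the five interviewed teachers are a subset of the ten sampled for absenteeism, which the plan does not state.

```r
# Second-stage selection within one school (illustrative only).
# `teachers` and `grade1_students` are hypothetical roster vectors.
draw_school_sample <- function(teachers, grade1_students,
                               n_absence = 10, n_interview = 5, n_g1 = 3) {
  # Simple random sample without replacement, capped at the roster size.
  # Indexing via sample.int avoids sample()'s special case for length-1 input.
  srs <- function(x, k) x[sample.int(length(x), min(k, length(x)))]

  absence   <- srs(teachers, n_absence)
  interview <- srs(absence, n_interview)   # assumed subset of the absence sample
  list(absence   = absence,
       interview = interview,
       grade1    = srs(grade1_students, n_g1))
}

set.seed(1)
s <- draw_school_sample(paste0("T", 1:14), paste0("S", 1:40))
lengths(s)
```

Capping each draw at the roster size keeps the helper usable in small schools with fewer than ten teachers.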

For our Survey of Public Officials, we will sample a total of 200 public officials. Roughly 40 officials will be surveyed at the federal level, while 160 officials will be surveyed at the regional/district level. For selection of officials at the regional and district level, we will employ a cluster sampling strategy, where 12 Governorate offices are chosen at random from among the regions in which schools were sampled. Then, among these 12 Governorates, we also select at random 12 Directorates from among the Directorates in which schools were sampled. The result of this sampling approach is that for 12 clusters we will have links from the school to the Directorate office to the Governorate office to the central office. Within the Governorates/Directorates, five or six officials will be sampled, including the head of organization, the HR director, two division directors from finance and planning, and one or two randomly selected professional employees among the finance, planning, and one other service-related department chosen at random. At the federal level, we will interview the HR director, finance director, planning director, and the directors of three randomly selected service-focused departments. In addition to the directors of each of these departments, a sample of 9 professional employees will be chosen in each department at random on the day of the interview.
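A minimal sketch of the office selection step, assuming a hypothetical frame `offices` that lists each district-level office with its parent regional office, already restricted to areas where schools were sampled. It draws 12 regional offices at random and then one district office within each. The one-per-region rule is an assumption of this sketch, and the frame is made up.

```r
# Hypothetical frame: 16 regional offices, each with 3 district offices.
offices <- data.frame(
  region   = rep(sprintf("R%02d", 1:16), each = 3),
  district = sprintf("D%02d", 1:48),
  stringsAsFactors = FALSE
)

set.seed(42)
n_clusters <- 12

# Stage 1: draw 12 regional offices at random, without replacement.
all_regions <- unique(offices$region)
regions <- all_regions[sample.int(length(all_regions), n_clusters)]

# Stage 2: draw one district office within each selected region
# (one-per-region is an assumption of this sketch).
pick_one <- function(r) {
  d <- offices$district[offices$region == r]
  d[sample.int(length(d), 1)]
}
clusters <- data.frame(
  region   = regions,
  district = vapply(regions, pick_one, character(1), USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
nrow(clusters)
```

Each row of `clusters` then identifies one district office and the regional office above it, which is the linkage the cluster design is meant to preserve.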

Sampling Approach for Global Education Policy Dashboard

This document will provide an overview of the sampling strategy used in the Global Education Policy Dashboard (GEPD) surveys, as well as remaining questions. New data for the dashboard will be collected using three main instruments: a School Survey, an Expert Survey, and a Survey of Public Officials. More information pertaining to each can be found below. The goal of the Global Education Policy Dashboard is to provide summary information at the national level on a set of 35 indicators and to allow countries to track progress on those indicators over a short time frame (every 2 years). Specifically, we aim to produce nationally representative estimates, which will be able to detect changes in the indicators over time at a minimum power of 80% and with a 0.05 significance level. We also wish to disaggregate by urban/rural.

School Survey: The School Survey will collect data primarily on Practices (the quality of service delivery in schools), but also on some de facto Policy and school-level Politics indicators. It will consist of streamlined versions of existing instruments (including SDI and SABER SD on teachers, 4th grade students, and inputs/infrastructure, TEACH on pedagogical practice, GECDD on school readiness of young children, and DWMS on management quality) together with new questions to fill gaps in those instruments. Though the number of modules is similar to the full version of SDI, the number of items within each module is significantly lower. In each country, this survey will be administered in a nationally representative sample of 250 schools, selected through stratified random sampling. As currently envisioned, the School Survey will include 8 short modules.

Expert Survey: The Expert Survey will collect information to feed into the policy indicators. This survey will be filled out by key informants in each country, drawing on their knowledge to identify key elements of the policy framework (as in the SABER approach to policy-data collection that the Bank has used over the past 7 years). The survey will have 4 modules, each including approximately ten questions.

Survey of Public Officials: The Survey of Public Officials will collect information about the capacity and orientation of the bureaucracy, as well as political factors affecting education outcomes. This survey will be a streamlined and education-focused version of the civil-servant surveys that the Bank’s Bureaucracy Lab has implemented recently in several countries, and the dashboard team is collaborating closely with DEC and Governance GP staff to develop this instrument. As currently envisioned, the survey will be administered to a random sample of about 200 staff serving in the central education ministry and district education offices. It will include questions about technical and leadership skills, work environment, stakeholder engagement, clientelism, and attitudes and behaviors.

Sierra Leone Specific Comments

Select the 40 additional schools

About the EGRA/EGMA sampling frame:

The sampling frame began with the 2019 Annual School Census (ASC) list of primary schools as provided by UNICEF/MBSSE. The sample of 260 schools for this study was obtained from this initial list of 7,154 primary schools. Only schools that met the pre-defined selection criteria were eligible for sampling.

To achieve the recommended sample size of 10 learners per grade, schools that had an enrolment of at least 30 learners in Grade 2 in 2019 were considered. To achieve a high level of confidence in the findings and generate enough data for analysis, the selection criteria only considered schools that:

• had an enrolment of at least 30 learners in grade 1; and
• had an active grade 4 in 2019 (enrolment not zero).

The sample was taken from a population of 4,597 primary schools that met the eligibility criteria above, representing 64.3% of all the 7,154 primary schools in Sierra Leone (as per the 2019 school census). Schools with higher numbers of learners were purposefully selected to ensure the sample size could be met in each site.
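The share quoted above follows directly from the two frame counts. The short check below reproduces it, along with the number of schools excluded by the eligibility criteria.

```r
# Frame counts taken from the text above.
n_all      <- 7154   # primary schools in the 2019 Annual School Census
n_eligible <- 4597   # schools meeting the eligibility criteria

share_eligible <- round(100 * n_eligible / n_all, 1)   # percent of all schools
n_excluded     <- n_all - n_eligible                   # schools dropped from the frame

print(share_eligible)   # 64.3
print(n_excluded)       # 2557
```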

As a result, a sample of 260 schools was drawn using proportional-to-size allocation with simple random sampling without replacement in each stratum. In the population, there were 16 districts and five school ownership categories (community, government, mission/religious, private and others). A total of 63 strata were made by forming combinations of the 16 districts and school ownership categories. In each stratum, a sample size was computed proportional to the total population and samples were drawn randomly without replacement. Drawing from other EGRA/EGMA studies conducted by Montrose in the past, a backup sample of up to 78 schools (30% of the sample), with which enumerator teams can replace sample schools, was also drawn.
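Proportional allocation rarely yields whole numbers, so rounded stratum sizes need not add up to the target. One common way to handle this is the largest-remainder method, sketched below in base R with made-up stratum counts. This is a swapped-in technique for illustration, not necessarily what was used for the EGRA/EGMA sample, and the `allocate_proportional` helper is hypothetical.

```r
# Largest-remainder proportional allocation (illustrative).
# `pop` holds hypothetical stratum sizes; `n` is the total sample size.
allocate_proportional <- function(pop, n) {
  exact <- n * pop / sum(pop)        # exact proportional share per stratum
  alloc <- floor(exact)              # start from the integer part
  short <- n - sum(alloc)            # units still to hand out
  if (short > 0) {
    # Give the remaining units to the largest fractional remainders.
    top <- order(exact - alloc, decreasing = TRUE)[seq_len(short)]
    alloc[top] <- alloc[top] + 1
  }
  # Never allocate more than a stratum contains; if this cap binds,
  # the total falls short of n and needs separate handling.
  pmin(alloc, pop)
}

pop   <- c(A = 520, B = 310, C = 95, D = 75)   # made-up strata
alloc <- allocate_proportional(pop, 40)
print(alloc)
sum(alloc)
```

Unlike random adjustment, this rule is deterministic: the same frame always gives the same allocation.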

In the distribution of sampled schools by ownership, the majority of the sampled schools are owned by mission/religious groups (62.7%, n=163), followed by government-owned schools at 18.5% (n=48). Additionally, in the distribution by district, the majority of the sampled schools (54%) are in Bo, Kambia, Kenema, Kono, Port Loko and Kailahun districts. Refer to annex 9 for details on the population and sample distribution by district.

Because of the restriction that at least 30 learners were available in Grade 2, we chose to add 40 schools to the sample from among smaller schools, with between 3 and 30 grade 2 students. The objective of this supplement was to make the sample more nationally representative, as the eligibility criteria reduced the sampling frame for the EGRA/EGMA sample by over 2,500 schools, from 7,154 to 4,597.

The 40 schools were chosen in a manner consistent with the original set of EGRA/EGMA schools. The 16 districts formed the strata. In each stratum, the number of schools selected was proportional to the total population of the stratum (measured by grade 4 enrolment), and within each stratum schools were chosen with probability proportional to size (grade 4 enrolment).
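The within-stratum draw can be sketched with base R's `sample.int`, using enrolment as the selection weight. The `stratum` data frame and `n_select` are made up for illustration. Note that weighted sampling without replacement draws units one at a time, so inclusion probabilities are only approximately proportional to size; strict PPS designs such as systematic PPS behave differently.

```r
# Weighted draw of schools within one stratum (illustrative data).
stratum <- data.frame(
  school_id       = sprintf("S%03d", 1:12),
  class4_combined = c(5, 8, 12, 20, 7, 15, 9, 25, 4, 11, 18, 6),
  stringsAsFactors = FALSE
)
n_select <- 3   # number of schools allocated to this stratum (assumed)

set.seed(2020)
# Draw row indices without replacement, weighting by grade 4 enrolment.
rows <- sample.int(nrow(stratum), size = n_select,
                   replace = FALSE, prob = stratum$class4_combined)
chosen <- stratum[rows, ]
chosen$school_id
```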

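# Enrolment thresholds: schools with fewer than 30 grade 2 pupils form the supplemental (small school) frame
pupil_count_max_threshold_g2 <- 30
# Minimum enrolment required in grades 1, 4 and 6
pupil_count_min_threshold_g1 <- 3
pupil_count_min_threshold_g4 <- 3
pupil_count_min_threshold_g6 <- 3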

df_gepd_normal_df <- df_selected %>%
  filter(sch_type=="B. Primary") %>%
  filter(class1_combined>=pupil_count_min_threshold_g1 & class4_combined>=pupil_count_min_threshold_g4) 
  
df_gepd_g6_df <- df_selected %>%
  filter(sch_type=="B. Primary") %>%
  filter(class1_combined>=pupil_count_min_threshold_g1 & class4_combined>=pupil_count_min_threshold_g4 & class6_combined>=pupil_count_min_threshold_g6) 
  
    

df_egra_frame <- df_selected %>%
  filter(sch_type=="B. Primary") %>%
  filter(class1_combined>=pupil_count_min_threshold_g1 & class4_combined>=pupil_count_min_threshold_g4) %>%
  filter(class2_combined>=pupil_count_max_threshold_g2)

df_updated <- df_selected %>%
  filter(sch_type=="B. Primary") %>%
  filter(class1_combined>=pupil_count_min_threshold_g1 & class4_combined>=pupil_count_min_threshold_g4) %>%
  filter(class2_combined<pupil_count_max_threshold_g2)

df_frame <- df_selected %>%
  filter(sch_type=="B. Primary") %>%
  filter(class1_combined>=pupil_count_min_threshold_g1 & class4_combined>=pupil_count_min_threshold_g4) 


df_updated %>%
  ungroup() %>%
    select(c(
    "sch_type",
    "accessibility",
    "mixed_school",
    "class1_combined",
    "class2_combined",
    "class3_combined",
    "class4_combined",
    "class5_combined",
    "class6_combined",
    "total_male_teachers",
    "total_female_teachers",
    "total_male_pupils",
    "total_female_pupils",
    "total_permanent_classrooms",
    "total_nonperm_classrooms",
    "total_teachers",
    "total_pupils",
    "total_classrooms",
    "sch_owner",
    "shift_status",
    "pri_lang_instruct",
    "electricity",
    "water_source",
    "sch_feeding",
    "internet",
    "computers"
  )) %>%
  skim() %>%
  filter(skim_type=="numeric") %>%
  select(-skim_type) %>%
  select(-contains('character')) %>%
  flextable() %>%
  add_header_lines("Summary Statistics of Sample Frame for Small Schools")
# In this code chunk, the number of schools to be selected in each stratum is set, based on the stratum's share of grade 4 pupils.
# The strata are the 16 districts.


strata_size <- df_updated %>%
  group_by(idregion,iddistrict) %>%
  summarise(
    n_schools=n(),
    n_students=sum(class4_combined),
    share_students=n_students/sum(df_updated$class4_combined),
    exp_num_schools=40*share_students,
    ini_num_schools=round(exp_num_schools,0)
  ) %>%
  ungroup() %>%
  sample_n(nrow(.)) #randomly order the rows


# Pick the number of schools in each stratum so that the total adds to 40
# Start from the vector of rounded expected counts (ini_num_schools)
num_schools <- strata_size$ini_num_schools
n_strata <- length(num_schools) # one entry per stratum (16 districts)

# Sequentially add or subtract one school in a randomly selected stratum until the total is 40
while (sum(num_schools)!=40) {
  
  if (sum(num_schools)<40) {
    
    nr <- sample.int(n_strata, 1) # uniformly choose a stratum index between 1 and n_strata
    num_schools[nr] <- num_schools[nr] + 1 # add one school to that stratum
  } else {
    nr <- sample.int(n_strata, 1) # uniformly choose a stratum index between 1 and n_strata
    if (num_schools[nr] > 0) { # never push a stratum below zero
      num_schools[nr] <- num_schools[nr] - 1 # subtract one school from that stratum
    }
  }
  
}


sum(num_schools)
## [1] 40
strata_size$num_schools <- num_schools

sum(strata_size$num_schools)
## [1] 40
# The initial rounded allocation gave 34 schools, so 6 more had to be chosen randomly (done in the loop above)
districts <- strata_size %>%
  group_by(idregion,iddistrict) %>%
  summarise(num_schools=sum(num_schools),
            n_students=sum(n_students))
#now choose the schools within strata
school_sample_df <- df_updated %>%
  left_join(strata_size, by=c("idregion","iddistrict")) %>%
  group_by(idregion,iddistrict) %>%
  sample_n(size=num_schools, weight=class4_combined)

sample_combined <- school_sample_df %>%
  mutate(School="Supplemental School") %>%
  bind_rows(egra_join_df)

write_excel_csv(sample_combined, file.path(sampling_folder, paste0('sample_schools_', Sys.Date(), '.csv')))

school_sample_df %>% 
  ungroup() %>%
    select(c(
    "sch_type",
    "accessibility",
    "mixed_school",
    "class1_combined",
    "class2_combined",
    "class3_combined",
    "class4_combined",
    "class5_combined",
    "class6_combined",
    "total_male_teachers",
    "total_female_teachers",
    "total_male_pupils",
    "total_female_pupils",
    "total_permanent_classrooms",
    "total_nonperm_classrooms",
    "total_teachers",
    "total_pupils",
    "total_classrooms",
    "sch_owner",
    "shift_status",
    "pri_lang_instruct",
    "electricity",
    "water_source",
    "sch_feeding",
    "internet",
    "computers"
  )) %>%
  skim() %>%
  filter(skim_type=="numeric") %>%
  select(-skim_type) %>%
  select(-contains('character')) %>%
  flextable() %>%
  add_header_lines('Summary Statistics of Supplemental Sample')

Replacement Schools

Below is a list of replacement schools for each sampled school. Replacement schools were randomly selected among the set of schools in the district, not including the originally sampled schools. Each row contains the school name, location, and other information for each replacement school. The final 5 columns of the database contain the school code, school name, region, and district of the originally sampled school for which this school serves as a replacement.
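A minimal sketch of such a replacement draw, assuming a hypothetical `frame` of schools with a district column and a vector `sampled_ids` of originally sampled schools. For each sampled school it picks one unsampled school at random from the same district. Not reusing a replacement for two sampled schools is an assumption of this sketch, and all data below are made up.

```r
# Hypothetical frame: 3 districts with 6 schools each.
frame <- data.frame(
  school_id = sprintf("S%02d", 1:18),
  district  = rep(c("Bo", "Kenema", "Kono"), each = 6),
  stringsAsFactors = FALSE
)
sampled_ids <- c("S01", "S02", "S08", "S13")   # made-up original sample

set.seed(7)
used <- character(0)
replacements <- data.frame(school_id = character(0),
                           replaces  = character(0),
                           stringsAsFactors = FALSE)
for (id in sampled_ids) {
  d <- frame$district[frame$school_id == id]
  # Candidates: same district, not originally sampled, not already used.
  pool <- frame$school_id[frame$district == d &
                          !(frame$school_id %in% c(sampled_ids, used))]
  if (length(pool) == 0) next   # no replacement available in this district
  pick <- pool[sample.int(length(pool), 1)]
  used <- c(used, pick)
  replacements <- rbind(replacements,
                        data.frame(school_id = pick, replaces = id,
                                   stringsAsFactors = FALSE))
}
replacements
```

The `replaces` column plays the role of the link back to the originally sampled school described above.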